07. Quiz: Q-Learning

Say that an agent is learning to navigate the gridworld described earlier in the lesson.

Gridworld Example

Suppose the agent is using Q-Learning in its search for the optimal policy, with $\alpha = 0.1$.

At the end of the 99th episode, the Q-table has the following values:

Q-table

Say that at the beginning of the 100th episode, the agent starts in state 1 and selects action right. As a result, it receives a reward of -1, and the next state is state 2.

Beginning of the 100th episode

In the previous video, you learned that at this point in time, the agent updates the Q-table.

Which entry in the Q-table is updated?

SOLUTION: The entry corresponding to **state 1** and **action right**.
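
As a reminder of why only this entry changes, the Q-learning update is usually written in the standard form below. The discount rate \gamma is not stated in this section, so it is left symbolic here.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left( R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right)
```

With S_t = state 1, A_t = right, R_{t+1} = -1, and S_{t+1} = state 2, the only entry being assigned is Q(state 1, right). The entries for state 2 are read to compute the maximum, but they are not modified at this step.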

What is the new value in the Q-table corresponding to the state-action pair you selected in the answer to the question above?

(Suppose that when selecting the actions for the first two timesteps of the 100th episode, the agent was following the epsilon-greedy policy with respect to the Q-table, with epsilon = 0.4.)

SOLUTION: 6.2
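
The arithmetic behind this kind of answer can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function name `q_learning_update` is made up for this example, the Q-table figure is not reproduced in the text, so the table entries below are hypothetical placeholders chosen so that the result agrees with the stated solution, and the discount rate `gamma = 1.0` is likewise an assumption.

```python
def q_learning_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Return the new value of Q(state, action) after one Q-learning step."""
    # Q-learning bootstraps from the best action value in the next state,
    # regardless of which action the agent actually takes there.
    best_next_value = max(q_table[next_state].values())
    current_value = q_table[state][action]
    td_target = reward + gamma * best_next_value
    return current_value + alpha * (td_target - current_value)


# HYPOTHETICAL entries: placeholders, not read from the Q-table figure.
q_table = {
    1: {"right": 6.0},
    2: {"left": 7.0, "right": 9.0},
}

new_value = q_learning_update(
    q_table,
    state=1,
    action="right",
    reward=-1.0,
    next_state=2,
    alpha=0.1,   # step size given in the quiz
    gamma=1.0,   # ASSUMPTION: the discount rate is not stated in this section
)
print(f"{new_value:.1f}")  # prints 6.2
```

Note that epsilon never appears in the computation. Because the update uses the maximum over the next state's action values, the epsilon-greedy setting affects which actions the agent selects but not the value written to the Q-table at this step.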